Theory of spike timing based neural classifiers

We study the computational capacity of a model neuron, the Tempotron, which classifies sequences
of spikes by linear-threshold operations. We use statistical mechanics and extreme value theory to
derive the capacity of the system in random classification tasks. In contrast to its static analog, the
Perceptron, the Tempotron's solution space consists of a large number of small clusters of weight
vectors. The capacity of the system per synapse is finite in the large size limit and weakly diverges
with the stimulus duration relative to the membrane and synaptic time constants.

Neural network models of supervised learning are usually concerned with processing static spatial
patterns of intensities. A famous example is learning in a single-layer binary neuron, the
Perceptron [1, 2]. However, in most neuronal systems, neural activities are in the form of time
series of spikes. Furthermore, stimulus representations in some sensory systems are characterized
by a small number of precisely timed spikes [3, 4], suggesting that the brain possesses machinery
for extracting information embedded in the timings of spikes, not only in their overall rate. Thus,
understanding the power and limitations of spike-timing based computation and learning is of
fundamental importance in computational neuroscience. Gütig and Sompolinsky have recently
suggested a simple model, the Tempotron, for decoding information embedded in spatio-temporal
spike patterns. The Tempotron is an Integrate and Fire (IF) neuron, with $N$ input synapses of
strength $\omega_i$, $i = 1, \ldots, N$. Each input pattern is represented by $N$ sequences of
spikes, where the spike timings for the afferent $i$ are denoted by
$\{t_i\}$. The membrane potential is given by

$$U(t) = \sum_{i=1}^{N} \omega_i \sum_{t_i < t} u(t - t_i), \qquad (1)$$

where $u(t)$ denotes a fixed causal temporal kernel. An example is the difference of exponentials
form: $u(t) = u_0 \left(e^{-t/\tau_m} - e^{-t/\tau_s}\right)$, where $\tau_m$ and $\tau_s$
correspond, respectively, to the membrane and synaptic time constants [6]. The Tempotron fires a
spike whenever $U$ crosses the threshold, $U_{th}$, from below [7] (Fig. 1a). The Tempotron
performs a binary classification of its input patterns by firing one or more output spikes when
presented with a 'target' (+1) pattern and remaining quiescent during a 'null' (-1) pattern.
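
As a rough illustration of the model described above, the following Python sketch evaluates the membrane potential as a weighted sum of kernel responses to past input spikes, and labels a pattern +1 if the potential exceeds the threshold anywhere on a time grid. The time constants (15 and 3.75, in arbitrary time units), the grid step, the threshold values, the Poisson rate and all function names are assumptions made for this example; they are not taken from the Letter.

```python
import math
import random

def kernel(t, tau_m=15.0, tau_s=3.75, u0=1.0):
    """Difference-of-exponentials kernel; causal, so it vanishes for t <= 0."""
    if t <= 0.0:
        return 0.0
    return u0 * (math.exp(-t / tau_m) - math.exp(-t / tau_s))

def potential(t, weights, spike_times):
    """Weighted sum over afferents of the kernel responses to spikes before t."""
    return sum(w * sum(kernel(t - s) for s in spikes)
               for w, spikes in zip(weights, spike_times))

def classify(weights, spike_times, u_th, duration, dt=0.5):
    """Return (label, u_max); label is +1 if U exceeds u_th on the grid, else -1."""
    n_steps = int(round(duration / dt))
    u_max = max(potential(k * dt, weights, spike_times) for k in range(n_steps + 1))
    return (1 if u_max > u_th else -1), u_max

def poisson_pattern(n_inputs, duration, rate, rng):
    """Draw one spike train per afferent from a homogeneous Poisson process."""
    pattern = []
    for _ in range(n_inputs):
        spikes, t = [], rng.expovariate(rate)
        while t < duration:
            spikes.append(t)
            t += rng.expovariate(rate)
        pattern.append(spikes)
    return pattern

if __name__ == "__main__":
    rng = random.Random(0)
    T = 100.0  # hypothetical pattern duration
    pattern = poisson_pattern(n_inputs=50, duration=T, rate=1.0 / T, rng=rng)
    weights = [rng.gauss(0.0, 1.0) for _ in range(50)]
    print(classify(weights, pattern, u_th=1.0, duration=T))
```

Because the potential starts at zero, exceeding a positive threshold on the grid is used here as a stand-in for "crossing from below"; a finer grid step reduces the chance of missing a brief excursion above threshold.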

In this Letter we present a theoretical study of the computational power of the Tempotron. We
focus on the standard task of classifying a batch of $P = \alpha N$ random patterns, where
$\alpha$ denotes the number of patterns per input synapse. For each pattern, the timings of the
input spikes from each input neuron are randomly chosen from independent Poisson processes with
rate $1/T$, where $T$ is the duration of the input patterns, and the desired output, $y = \pm 1$,
is randomly and independently chosen with equal probabilities. A solution to the classification
problem is a set of synaptic weights $\{\omega_i\}$ that yields a correct classification of all
$P$ patterns. We will address several fundamental questions. First, numerical simulations based
on a simple error-correcting on-line learning algorithm suggest that the capacity of the IF
neuron, namely the maximal number of patterns per synapse, $\alpha_c$, which can be correctly
classified with high probability (approaching 1 for large $N$), is independent of the number of
input synapses [5]; however, an analytical proof for this property has been lacking. Secondly, it
is important to understand how the computational capabilities of the neuron depend on the various
time scales in the dynamics of the system. Finally, our study highlights the complex geometric
structure of the space of solutions for $\alpha < \alpha_c$, similar to the one arising in other
hard computational problems, such as learning in multilayered neural networks [8] or random
combinatorial optimization [9, 10].

Figure 1. (a) Example of voltage traces $U(t)$. (b) Probability density of the rescaled maximal
potential $x$ as defined in eq. (4) with a fitted scale factor $\beta$. (c) Probability of
$N_{spikes}$. The line in (b) is a standard Gumbel law. In (c) circles indicate the theoretical
Poisson law. Data was measured with $K = 400$, $\alpha = 1.68$, $N = 500$ and 34 samples.

Our theoretical analysis, presented below, shows that a fundamental parameter is the pattern
duration, $T$, relative to the neural time scales,

$$K = \frac{T}{\sqrt{\tau_m \tau_s}}. \qquad (2)$$

The properties of the Tempotron can be most easily understood when both $N$ and $K$ are large,
with $N \gg K$.
This limit is biologically sensible if we consider a neuron with $N \sim 10^4$ synapses, inputs
that are presented for $T \sim 100 - 1000$ milliseconds, and time constants $\tau_s \sim 1 - 10$,
$\tau_m \sim 10 - 100$ milliseconds. We predict that, for any
fixed $K$, the capacity is independent of $N$ in the large $N$ limit. Furthermore, the capacity
grows with $K$ as

$$\alpha_c \simeq \frac{\ln \ln K}{2 \ln 2}. \qquad (3)$$

The convergence of the capacity to this expression is slow, requiring that $\sqrt{\ln K} \gg 1$.
Nevertheless, this result has several qualitative implications. Equation (3) implies that the
capacity of the Tempotron is not bounded as $K$ increases, and may exceed the capacity of the
well-known Perceptron model ($\alpha_c = 2$), whose architecture is similar to the Tempotron.
Note that when $K$ is $O(N)$, the few input spikes that arrive within a single decision time
window, $T/K$, do not carry sufficient information to classify the patterns. We therefore expect
that for any fixed $N$, $\alpha_c$ is a non-monotonic function of $K$, while the value of $K$
that maximizes the capacity increases with $N$, as implied by (3). This prediction is
corroborated by numerical simulations in Fig. 2a. Interestingly, according to eq. (2), the
performance should also be sensitive to the short time behavior of the kernel, as confirmed by
the simulations of Fig. 2b. This short time behavior determines how fast the membrane potential
can change significantly. The faster this change can be, the easier it is to distinguish between
inputs that arrive within a short interval of time.
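
To get a feeling for how weakly the capacity depends on the time scales, the short Python sketch below evaluates the two quantities discussed above. It assumes that eq. (2) defines $K$ as the pattern duration divided by $\sqrt{\tau_m \tau_s}$ and that eq. (3) is the leading-order form $\ln \ln K / (2 \ln 2)$ quoted in the caption of Fig. 2; the function names, the optional constant `alpha_0` and the numerical values of the time constants are hypothetical choices for illustration.

```python
import math

def k_parameter(duration, tau_m, tau_s):
    """Pattern duration in units of the geometric mean of the two time constants."""
    return duration / math.sqrt(tau_m * tau_s)

def capacity_large_k(k, alpha_0=0.0):
    """Leading-order capacity ln(ln K) / (2 ln 2), plus an optional constant."""
    if k <= 1.0:
        raise ValueError("the large-K expression needs K > 1")
    return math.log(math.log(k)) / (2.0 * math.log(2.0)) + alpha_0

if __name__ == "__main__":
    for duration in (100.0, 500.0, 1000.0):  # hypothetical durations, in ms
        k = k_parameter(duration, tau_m=15.0, tau_s=3.75)
        print(duration, round(k, 1), round(capacity_large_k(k), 3))
```

Under these assumptions, shortening the synaptic time constant at fixed duration increases $K$, and a hundredfold increase of $K$ from 10 to 1000 raises the leading-order term by $\ln 3 / (2 \ln 2)$, which is less than one pattern per synapse.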

In the Perceptron model, the solution space for a given classification task is a convex volume,
which shrinks in size and ultimately vanishes as $\alpha$ approaches the capacity, $\alpha_c$.
The overlap between two typical solutions, defined by the inner product between their normalized
weight vectors, approaches 1 at the critical capacity. Our theory reveals that the solution space
of the Tempotron is of a strikingly different nature. First, the overlap between two Tempotron
weight vectors that solve the random classification problem, $q_0$, approaches zero in the
$K \gg 1$ limit, for every $\alpha < \alpha_c$. Secondly, the solution space is connected for
small $\alpha$ only. For larger values of $\alpha$, still far below capacity, the solution space
breaks into a large number of small disconnected clusters, spread across the entire weight space.
The overlap between solutions within the same cluster, $q_1$, is close to

Figure 2. (a) Capacity of the Tempotron vs. $K$. Lines with symbols show results from the
learning algorithm of [5]. The solid line shows the large-$K$ theory (3), with an additive
constant ($\alpha_c = (\ln \ln K)/(2 \ln 2) + \alpha_0$, with $\alpha_0 = 2.58$) fitted to the
predictions of the replica method for the discrete Tempotron for $K_{discrete} = 2, 3, 4$
(x symbols). To compare the theory of the discrete Tempotron with the simulation results
of the continuous time Tempotron we used T^discrotc _ ^'^g 
(b) Distribution of learning times for different $\tau_s$, and for fixed $\tau_m \simeq T/25$,
$\alpha = 2.6$ and $N = 1000$. As $\tau_s$ decreases so does the mean learning time, indicating
that $|\alpha - \alpha_c|$ has increased, as predicted by eqs. (2) and (3).

Figure 3. (a) Overlap between randomly chosen solutions, $q_0$, at $\alpha = 2$ as a function of
$K$. (b) Auto-correlation function
$q_{AC}(\Delta t) = \langle \omega(t) \cdot \omega(t + \Delta t) \rangle$ of a random walk inside
a connected volume in solution space for $K = 150$ and $N = 500$.



1, while two randomly chosen solutions are likely to lie in different clusters and have overlap
$q_0 \simeq 0$. Simulations making use of the learning algorithm of [5] support this picture.
The overlap between two solutions obtained from two different initial weight vectors vanishes for
all values of $\alpha$ (Fig. 3a). To probe the overlap between solutions in the same cluster, we
performed a random walk in solution space [11], starting from a solution found by the Tempotron
learning algorithm and rejecting the random walk step attempts if they lead to a weight vector
that is not a valid solution. The auto-correlation function of this random walk drops
exponentially fast to zero for small $\alpha$, indicating that the solution space is connected,
and hardly decays for higher $\alpha$ ($< \alpha_c$), as expected for a clustered solution space
(Fig. 3b).
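
The rejection random walk described above can be sketched generically as follows. This is a minimal illustration, not the procedure used for Fig. 3: the Gaussian proposal, the step size, the renormalization of the weight vector after every step, and the function names are all assumptions, and the predicate `is_solution` is a placeholder for the check that a weight vector classifies every pattern correctly.

```python
import math
import random

def normalize(w):
    """Rescale a weight vector to unit length."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]

def rejection_walk(w0, is_solution, n_steps, step=0.05, seed=0):
    """Random walk on the unit sphere that never leaves the solution set."""
    rng = random.Random(seed)
    w = normalize(w0)
    if not is_solution(w):
        raise ValueError("the walk must start from a valid solution")
    path = [w]
    for _ in range(n_steps):
        trial = normalize([x + step * rng.gauss(0.0, 1.0) for x in w])
        if is_solution(trial):   # accept the move only if it is still a solution
            w = trial
        path.append(w)           # a rejected step repeats the current point
    return path

def autocorrelation(path, lag):
    """Average overlap between weight vectors separated by `lag` steps."""
    dots = [sum(a * b for a, b in zip(u, v)) for u, v in zip(path, path[lag:])]
    return sum(dots) / len(dots)

if __name__ == "__main__":
    start = [1.0] + [0.0] * 19
    # Toy stand-ins for a small cluster and for an unconstrained space.
    confined = rejection_walk(start, lambda w: w[0] > 0.9, n_steps=2000)
    free = rejection_walk(start, lambda w: True, n_steps=4000)
    print(autocorrelation(confined, 500), autocorrelation(free, 2000))
```

In this toy setting a walk confined to a small region keeps a high auto-correlation at long lags, whereas an unconstrained walk decorrelates, which mirrors the qualitative distinction drawn in the text between clustered and connected solution spaces.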

The above results are surprising and counter-intuitive since they imply that even close to
capacity, IF neurons with very different weights can perform exactly the same classification,
whereas IF neurons with a high degree of similarity in their weight vectors will typically fail
to solve

Figure 4. (a) Time traces of the potentials of a Tempotron with random weights, $U_1(t)$ (bold
line), and of seven other Tempotrons, $U_2(t)$ (gray lines), having overlap $q = 0.8$ with the
first one. The pattern is the same for all Tempotrons, and is classified as +1 by the first
Tempotron: $U_1(t)$ is maximal at $t_1$, and exceeds $U_{th}$. The error bar is centered at
$t = t_1$, $U_2 = q U_1(t_1)$, and has height $\sqrt{1 - q^2}$. Parameters are $K = 100$,
$N = 1000$. (b) Probability that two neurons will classify a random pattern in the same manner,
$P_{equal}$, vs. overlap between their weight vectors, $q$, for the Perceptron (theory and
simulations in black), the Tempotron (blue x and + symbols correspond, respectively, to $K = 100$
and $K' = 4K = 400$) and the Hodgkin-Huxley (red squares and circles correspond to $T = 1.5$ sec
and $T' = 4T = 6$ sec, respectively [13]) models.

the same task. To understand these properties we consider a Tempotron whose $N$ weights are
random variables drawn from any probability distribution with finite first two moments. With no
loss of generality we may choose the mean and variance of the weights to ensure that $U(t)$ has
zero mean and unit variance. The threshold potential, $U_{th}$, is such that a random pattern is
classified by each Tempotron as $\pm 1$ with equal probabilities, i.e., $U_{th}$ is the median
value of the distribution of the maximum of $U(t)$ over time, $U_{max}$. The synaptic potential
$U(t)$ induced by a random input pattern approaches, in the large $N$ limit, a temporally
correlated Gaussian process. We use extreme value theory (EVT) of Gaussian processes to evaluate
the statistics of $U_{max}$. According to EVT, $U_{max}$ can be written as

$$U_{max} = U_{th} + \beta \left(x + \ln \ln 2\right), \qquad (4)$$

where $x$ obeys the Gumbel density distribution $G(x) = \exp\left(-x - \exp(-x)\right)$, whose
median is $-\ln \ln 2$. The scale factor is $\beta \simeq 1/\sqrt{2 \ln K} + O(1/\ln K)$ and the
threshold is $U_{th} \simeq \sqrt{2 \ln K} + O(1/\sqrt{\ln K})$, where $K = T \sqrt{-\ddot{C}(0)}$
and $C(t) = \langle U(t') U(t' + t) \rangle$ is the auto-correlation function of $U(t)$. These
results are valid provided that $C(t)$ decays to zero at long times and $K$ is large. Note that
for a kernel $u(t)$ in eq. (1) of the form of difference of exponentials, $K$ takes the value of
eq. (2).
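
The extreme value statements above can be checked numerically with a few lines of Python. The sketch below assumes the leading-order forms $U_{th} \simeq \sqrt{2 \ln K}$ and $\beta \simeq 1/\sqrt{2 \ln K}$ (dropping the correction terms) and draws the Gumbel variable by inverting its cumulative distribution $\exp(-\exp(-x))$; the function names and the sample size are choices made for this illustration only.

```python
import math
import random

def gumbel_cdf(x):
    """Cumulative distribution of the density exp(-x - exp(-x))."""
    return math.exp(-math.exp(-x))

def evt_scales(k):
    """Leading-order threshold sqrt(2 ln K) and scale factor 1 / sqrt(2 ln K)."""
    root = math.sqrt(2.0 * math.log(k))
    return root, 1.0 / root

def sample_u_max(k, n_samples, seed=0):
    """Draw U_max = U_th + beta * (x + ln ln 2) with x Gumbel-distributed."""
    rng = random.Random(seed)
    u_th, beta = evt_scales(k)
    shift = math.log(math.log(2.0))
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        while u == 0.0:                      # avoid log(0) in the inversion
            u = rng.random()
        x = -math.log(-math.log(u))          # inverse of the Gumbel CDF
        samples.append(u_th + beta * (x + shift))
    return samples

if __name__ == "__main__":
    u_th, beta = evt_scales(400.0)
    samples = sample_u_max(400.0, 20000)
    above = sum(s > u_th for s in samples) / len(samples)
    print(u_th, beta, above)
```

Since the median of the Gumbel variable is $-\ln \ln 2$, about half of the sampled maxima fall above the threshold, which is the sense in which the threshold is the median of the distribution of the maximal potential.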

We now consider two such Tempotrons, with an overlap $q$ between their two weight vectors. Let us
choose a pattern that is classified as +1 by the first and denote by $t_1$ the time at which its
potential reaches its maximum value $U_1 > U_{th}$. Let us denote the postsynaptic potential of
the second Tempotron at time $t_1$ by $U_2$. Conditioned on $U_1$, the probability distribution
of $U_2$ is Gaussian with mean $\bar{U}_2 = q U_1$ and standard deviation
$\sigma = \sqrt{1 - q^2}$. According to (4), $U_1$ is close to $U_{th}$, and we may approximate
$U_{th} - \bar{U}_2 \simeq (1 - q) \sqrt{2 \ln K}$. Thus, as long as $1 - q \gg 1/\ln K$, the
typical fluctuations of $U_2$, which are of $O(\sigma)$, are much smaller than the gap between
$\bar{U}_2$ and the threshold (Fig. 4a); hence $U_2$ is very likely smaller than $U_{th}$. This
implies that the overall probability that the second Tempotron's potential crosses the threshold
at any time remains close to 1/2, unless

$$1 - q \lesssim O\left(\frac{1}{\ln K}\right). \qquad (5)$$
Thus, two Tempotrons are likely to agree on their classifications of a random pattern only if the
overlap in their synaptic weights is close to 1. This result is confirmed by the simulations
shown in Fig. 4b. We also present the simulation results for the Hodgkin-Huxley model [13], a
classical biophysical model for spike generation. Interestingly, despite its complex dynamics,
the classification pattern of a pair of Hodgkin-Huxley neurons is similar to that of the
Tempotron, indicating that this behavior does not depend on the details of the spike generation
but on the summation of input spikes within temporal windows. In contrast, in the case of the
Perceptron, which lacks temporal windows, the probability that two weight vectors agree on their
classification increases roughly linearly with their overlap, $q$ (Fig. 4b). The above result
provides a qualitative explanation of the clustered nature of the solution space. Consider one
solution to the classification task. Very similar weight vectors, with overlaps larger than
$1 - O(1/\ln K)$, are likely to be solutions, too, and compose a very small connected cluster of
solutions around the first solution. On the other hand, having any positive overlap smaller than
this scale does not provide a significant advantage in terms of classification error. Hence,
entropy pressure for decreasing the overlap wins, yielding a vanishingly small overlap $q_0$
between two typical solutions.
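
The overlap scale $1/\ln K$ used in this argument can be traced to a comparison of two quantities. The worked step below is a sketch that only uses expressions quoted in the text: the gap $(1 - q)\sqrt{2 \ln K}$ between the threshold and the conditional mean $\bar{U}_2 = q U_1$ of the second neuron's potential, and the conditional standard deviation $\sigma = \sqrt{1 - q^2}$.

```latex
\frac{U_{th} - \bar{U}_2}{\sigma}
  \simeq \frac{(1-q)\sqrt{2\ln K}}{\sqrt{(1-q)(1+q)}}
  = \sqrt{\frac{1-q}{1+q}}\;\sqrt{2\ln K}
  \;\xrightarrow{\;q \to 1\;}\; \sqrt{(1-q)\,\ln K}
```

For large $K$ this ratio is large at any fixed $q < 1$, so the second neuron is unlikely to reach threshold at $t_1$. The ratio becomes of order one only when $1 - q$ is of order $1/\ln K$, which is the overlap scale at which the two neurons start to agree.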

The fact that $q_0$ is small for all $\alpha$ has important consequences. First, $q_0$ in general
measures the strength of the correlations between the solution weight vector and individual
quenched learnt patterns. Small $q_0$ implies, therefore, that the statistics of the potential
after learning is approximately Gaussian with variance and mean which are governed by the
requirement that random patterns induce spiking with probability 1/2. As described above, this
implies that the distribution of $U_{max}$ of learnt patterns has a Gumbel shape. Furthermore,
EVT predicts that the number of threshold crossings in a pattern of duration $T$, $N_{spikes}$,
obeys a Poisson distribution with a mean rate $r = \ln 2 / T$, consistent with a 1/2-probability
of firing within time $T$ [12]. These predictions are confirmed






by numerical simulations shown in Fig. 1b,c.

EVT provides a basis for estimating the value of the capacity. Drawing on analogy from the
replica calculations ([14] and below), we estimate the entropy of clusters in the solution space,
$S_{cl}$, through $S_{cl} = (\ln V - \ln V_{cl})/N$, where $V$ and $V_{cl}$ are, respectively,
the total volume of solutions and the typical volume of one cluster. As $q_0 \simeq 0$, $V$ is
simply the product of the probabilities that the Gaussian potential $U$ crosses the threshold for
each +1 pattern and does not do so for each -1 pattern: $V = (1/2)^{\alpha N}$. Assuming that the
typical cluster is of 'compact' shape, its volume is given by $V_{cl} = (1 - q_1)^{N/2}$, where
$q_1$ is the typical overlap between solutions within the cluster and scales according to eq. (5)
as $1 - q_1 = O(1/\ln K)$. We therefore obtain

$$S_{cl} = \frac{1}{2} \ln \ln K - \alpha \ln 2. \qquad (6)$$

Classifications are possible as long as $S_{cl} > 0$, which yields the capacity (3).
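
The intermediate algebra behind this estimate can be written out as follows. The sketch assumes the volumes read off from the text, $V = (1/2)^{\alpha N}$ and $V_{cl} = (1 - q_1)^{N/2}$, and writes $1 - q_1 = c / \ln K$, where $c$ is a hypothetical order-one constant introduced here only to keep track of the prefactor that the scaling argument leaves undetermined.

```latex
\begin{aligned}
\ln V &= -\alpha N \ln 2, \\
\ln V_{cl} &= \tfrac{N}{2}\,\ln(1 - q_1)
            = -\tfrac{N}{2}\left(\ln\ln K - \ln c\right), \\
S_{cl} &= \frac{\ln V - \ln V_{cl}}{N}
        = \tfrac{1}{2}\ln\ln K - \tfrac{1}{2}\ln c - \alpha \ln 2, \\
S_{cl} = 0 \;&\Rightarrow\;
\alpha_c = \frac{\ln\ln K}{2\ln 2} - \frac{\ln c}{2\ln 2}.
\end{aligned}
```

Under these assumptions the undetermined prefactor only shifts the capacity by a $K$-independent amount, so the heuristic fixes the growth with $K$ but not the additive constant; the constant $\alpha_0$ quoted in the caption of Fig. 2 is obtained there by a fit and need not coincide with the shift in this sketch.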

The above results are supported by an independent statistical mechanical study of a simpler
model, the discrete Tempotron [15, Sup. Mat.], where time is discrete, $t = \ell \tau$,
$\ell = 1, 2, 3, \ldots$, and the potential $U_\ell$ is the sum of the synaptic weights
$\omega_i$, multiplied by the number of spikes emitted by input $i$ in the time-bin $\ell$. The
patterns to be classified are associated with an internal representation (IR), which consists of
the set of time-bin indices $\ell$ such that $U_\ell > U_{th}$. The weight vectors implementing
the same IR form a convex domain of solutions. As the entire solution space is not expected to be
convex, calculating its volume is a difficult task. Instead, following [14], we have calculated
the average value of the



logarithm of the number of typical implementable IR domains, $S_{IR}$, as a function of
$\alpha$. The calculation, based on the replica method, involves two overlaps: the intra-overlap
of a domain and the inter-overlap between two domains. When $K \gg 1$, we find that the
intra-overlap differs from 1 by a term of order $1/\ln K$, and that $S_{IR}$ is given by the
right-hand side of (6). The inter-overlap vanishes as long as $\alpha \ll \ln K$, and the scaling
of the intra-overlap is compatible with $q_1$ given by EVT. This calculation also enables us to
estimate the capacity at finite $K$ (see Fig. 2a).
The similarity between quantities defined in terms of connected clusters of solutions, and those
defined in terms of IR domains, is a consequence of the binary character of the overlaps in the
large $K$ limit. For the same reason, further effects of replica symmetry breaking should affect
only subleading corrections to $\alpha_c$. Numerical simulations show that the discrete Tempotron
behaves very similarly to the continuous time Tempotron (data not shown). This implies that the
computational capability of the Tempotron is not sensitive to the detailed shape of the temporal
integration.
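
The discrete model described above lends itself to a compact illustration. The Python sketch below computes, for one pattern given as per-bin spike counts, the potential in each time-bin and the internal representation, i.e. the set of bins above threshold. The data layout (`counts[i][l]` for input `i` and bin `l`), the function names and the numerical values are assumptions made for this example.

```python
def bin_potentials(weights, counts):
    """counts[i][l] is the number of spikes of input i in time-bin l (0-based)."""
    n_bins = len(counts[0])
    return [sum(w * row[l] for w, row in zip(weights, counts)) for l in range(n_bins)]

def internal_representation(weights, counts, u_th):
    """Set of 1-based time-bin indices whose potential exceeds the threshold."""
    return {l + 1 for l, u in enumerate(bin_potentials(weights, counts)) if u > u_th}

def classify(weights, counts, u_th):
    """+1 if at least one bin is above threshold, otherwise -1."""
    return 1 if internal_representation(weights, counts, u_th) else -1

def mix(w_a, w_b, lam):
    """Convex combination lam * w_a + (1 - lam) * w_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(w_a, w_b)]

if __name__ == "__main__":
    counts = [[1, 0, 2], [0, 1, 0], [1, 1, 0]]   # three inputs, three time-bins
    w_a, w_b = [0.5, 0.2, 0.6], [0.6, 0.1, 0.5]
    for w in (w_a, w_b, mix(w_a, w_b, 0.5)):
        print(internal_representation(w, counts, u_th=0.9), classify(w, counts, 0.9))
```

Because each bin potential is linear in the weights, two weight vectors that produce the same internal representation for a pattern also share it with every convex combination of the two, which is the convexity property of an IR domain mentioned in the text.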

In conclusion, we have presented a theory of the computational capacity of a neuron that performs
classification of inputs by integrating incoming spikes in space and time and generates its
decision via threshold crossing. Importantly, the Tempotron is not constrained to fire at a given
time in response to a target pattern. Thus, by adjusting the timing of its output spikes, the
Tempotron can choose the spatio-temporal features that will trigger its firing for each target
pattern. Despite the simplicity of its architecture and dynamics, this property of the Tempotron
decision rule yields a rather complex structure of the solution space and accounts for the
superior performance of the Tempotron compared to the Perceptron and to Perceptron-based models
for learning temporal sequences [16], which specify the desired times of the output spikes.